Linguistic Resources for Handwriting Recognition and Translation Evaluation
Abstract
We describe efforts to create corpora that support the development and evaluation of handwriting recognition and translation technology. LDC has developed a stable pipeline and infrastructure for collecting and annotating handwriting linguistic resources to support the MADCAT and OpenHaRT evaluations. We collect handwritten samples of pre-processed Arabic and Chinese data from the GALE program that have already been translated into English. To date, LDC has recruited more than 600 scribes and has collected, annotated, and released more than 225,000 handwriting images. Most linguistic resources created for these programs will be made available to the larger research community through publication in LDC’s catalog. The phase 1 MADCAT corpus is now available.
Similar resources
Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model
To facilitate the entry of data into computers and its digitization, automatic recognition of printed texts and manuscripts is a considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...
The UPV Handwriting Recognition and Translation System for OpenHaRT 2013
The NIST Open Handwriting Recognition and Translation Evaluation 2013 (NIST OpenHaRT’13) is a performance evaluation assessing technologies that transcribe and translate text in document images. This evaluation is focused on recognizing Arabic text images and translating them into English. A Handwriting Recognition and Translation system typically consists of a combination of two systems: a Tex...
OpenHaRT 2013 Evaluation: Description of the Litis Handwriting Recognition System
In this paper, we present the Arabic handwriting recognition system that was submitted to the 2013 NIST Open Handwriting Recognition and Translation Evaluation (OpenHaRT 2013). Our baseline recognition system is based on Hidden Markov Models and we also propose a lattice-based framework to combine the outputs from several different recognition engines. Keywords—Document recognition, Arabic hand...
On the Effects of Linguistic, Verbal, and Visual Mnemonics on Idioms Learning
Finding more effective ways of teaching second language idioms has been a long-standing concern of many teaching practitioners and researchers. This study was an endeavor to explore the effects of three linguistic mnemonic devices (etymological elaboration, keyword method, and translation) on EFL learners’ recognition and recall of English idioms. To achieve the purpose of the study, ninety male...
Online Persian Handwriting Recognition Using a Language Model and Reduced User Writing Rules
The joined, cursive form of Persian words, the wide variety of Persian scripts, and the different shapes that letters take depending on their position within a word make Persian handwriting recognition a serious challenge. The major shortcoming of the most common recognition methods is their inattention to sentence context, which leads to the use of a word with correct appea...